Deciphering Optimal Radar Ensemble for Advancing Sleep Posture Prediction through Multiview Convolutional Neural Network (MVCNN) Approach Using Spatial Radio Echo Map (SREM)

Assessing sleep posture, a critical component in sleep tests, is crucial for understanding an individual’s sleep quality and identifying potential sleep disorders. However, monitoring sleep posture has traditionally posed significant challenges due to factors such as low light conditions and obstructions like blankets. The use of radar technology could be a potential solution. The objective of this study is to identify the optimal quantity and placement of radar sensors to achieve accurate sleep posture estimation. We invited 70 participants to assume nine different sleep postures under blankets of varying thicknesses. This was conducted in a setting equipped with a baseline of eight radars: three positioned at the headboard and five along the side. We proposed a novel technique for generating radar maps, Spatial Radio Echo Map (SREM), designed specifically for data fusion across multiple radars. Sleep posture estimation was conducted using a Multiview Convolutional Neural Network (MVCNN), which serves as the overarching framework for the comparative evaluation of various deep feature extractors, including ResNet-50, EfficientNet-B0, DenseNet-121, PHResNet-50, Attention-56, and Swin Transformer. Among these, DenseNet-121 achieved the highest accuracy, scoring 0.534 and 0.804 for nine-class fine-grained and four-class coarse-grained classification, respectively. This led to further analysis on the optimal ensemble of radars. For the radars positioned at the head, a single left-located radar proved both essential and sufficient, achieving an accuracy of 0.809. When only one central head radar was used, omitting the central side radar and retaining only the three upper-body radars resulted in accuracies of 0.779 and 0.753, respectively. This study established the foundation for determining the optimal sensor configuration in this application, while also exploring the trade-offs between accuracy and the use of fewer sensors.


Introduction
Sleep posture is one of the essential components in sleep tests and sleep monitoring systems that provide valuable insights into sleep patterns and sleep-related health [1,2]. Various health conditions and their treatments have been found to correlate with sleep positions or sleep postures [3,4]. Sleep posture is related to the biomechanics of the airway and spine, implicating sleep-related breathing and musculoskeletal disorders [5]. For instance, adopting a lateral sleep posture can alleviate the symptoms of sleep apnea [3], while a supine posture may provide relief for individuals suffering from lower back and neck pain [4]. Sleep posture also serves as an indicator for sleep quality and sleep ergonomics [6,7]. It has been observed that individuals with poor sleep quality frequently change postures and prefer the supine position [6]. In other words, sleep posture serves as a critical link in the complex relationship between sleep health, quality, and ergonomics [8]. Comprehensive overnight sleep studies require objective and efficient sleep posture measurements to inform personalized sleep recommendations and interventions. However, traditional sleep studies (i.e., polysomnography) rely on manual observation to identify sleep postures, which is labor-intensive and may be prone to errors [2]. To address this challenge, specialized sensors and artificial intelligence for sleep posture measurement and estimation have become increasingly prevalent. These technologies enable the automatic and accurate acquisition of sleep posture, thus enhancing the precision and efficiency of sleep studies.
Contact and non-contact sensors are two major categories in sleep posture recognition technologies. Numerous studies have applied pressure sensors and wearable devices to estimate sleep posture. Pressure mapping technologies, which integrate sensitive conductive sheets into mattresses or bedsheets, can identify different sleep postures based on changes in the body's interfacial pressure patterns [9,10]. However, they might be costly, have limited availability, be influenced by specific mattresses [11,12], and require regular maintenance and cleaning. Furthermore, wearable sleep technologies, also known as actigraphy, incorporate various sensors to measure both biophysical signals and sleep postures [13][14][15], in addition to sleep stage classification. They can also be readily used at home [16,17]. Accelerometers within these wearable devices can be attached to the body to identify sleep postures and track posture changes [18][19][20]. Nevertheless, despite the lack of clear evidence on this issue [21], it is believed that some users, especially older people or those with emotional problems, may find actigraphy uncomfortable to wear and difficult to comply with [22]. On the other hand, machine learning and deep learning models, particularly support vector machines (SVMs) and convolutional neural networks (CNNs), have been applied to facilitate sleep posture estimation with these devices [2,23].
Non-contact methods utilize optical sensors, particularly video cameras and computer vision systems, to estimate sleep posture [14,15,24]. It is becoming increasingly common to use depth or infrared cameras on their own, or to complement traditional video cameras, for sleep posture estimation. Their strength lies in the ability to function in night-time conditions and protect privacy [25][26][27][28][29][30], and they are also used to monitor bed-exiting events [31,32]. Another significant advantage of depth cameras is their ability to estimate sleep posture under a blanket by assessing image depth, which conventional optical cameras cannot do [27][28][29]33]. These techniques are often combined with machine or deep learning models, such as CNNs and SVMs [27][28][29], to automate the process. In particular, Tam et al. [28] proposed an intraclass mix-up technique to generalize blanket conditions, and efforts have been made to estimate the joint coordinates in sleep postures [28,34].
Radar technology presents another alternative non-contact method for sleep posture estimation. It combines the advantages of depth cameras and requires even less exposure and visuals for accurate estimation [35]. Several studies have explored the potential of radar technology for sleep posture recognition along with machine or deep learning models. Higashi et al. [36] achieved an accuracy of 88% using 24 GHz Doppler radar data and machine learning for sleep posture recognition. BodyCompass, a system developed by Yue et al. [37], integrates FMCW radar with a sweeping frequency ranging from 5.4 GHz to 7.2 GHz to determine patient posture, achieving an accuracy of up to 94%. Kiriazi et al. [38] investigated sleep posture estimation using a dual-frequency Doppler radar emitting 2.4 GHz and 5.8 GHz waves mounted on the ceiling above the bed to monitor torso reflections and movement. However, the use of continuous wave radar for indoor sleep monitoring might be limited due to potential multipath interference [39]. To address this challenge, Piriyajitakonkij et al. [40] proposed a method that utilizes data from both temporal and spectral domains, enhancing IR-UWB signal detection for the recognition of four sleep transitional postures.
Sleep posture recognition can be considered a fine-grained classification problem, which can be effectively addressed using multimodal data fusion or multiview data fusion approaches. These methods integrate diverse data sources or multiple views of the same data to enhance the discriminative power of classification models. For example, Khaire, Imran, and Kumar [41] demonstrated that fusing RGB-D and skeletal data improved human activity classification by providing complementary information that enhanced feature representation. Similarly, Zhu and Liu [42] employed multiview attention to combine visual and optical flow data for fine-grained action recognition, achieving a superior accuracy compared to single-modality approaches. Xie et al. [43] utilized co-located IR-UWB radar and depth sensors for fine-grained activity recognition and tracking in a domestic setting. These advancements highlight the potential of data fusion approaches to tackle the challenges associated with fine-grained classification by leveraging the strengths of multiple data modalities.
The number and placement of sensors are important factors in the performance of sleep posture estimation. While intuitively increasing the number of sensors and diversifying their positions could enhance performance, this would introduce additional costs and complexities into the experimental setup. Moreover, it could potentially burden the model for posture estimation, since it would need more computing resources to process more sensor data.
The research gap lies in the insufficient understanding of the minimal sensor configuration and placement strategy required to achieve accurate results. Existing studies often presume placing the radar in the center as a rule of thumb and focus on optimizing predictions based on this configuration. Our research question involves determining the minimum number of radar sensors and their positions to achieve the best performance in sleep posture estimation. Our previous studies have explored multiple radar configurations, including dual (examining single-radar settings and dual-radar settings) and triple settings (examining the influence of top, head, and side radar placement combinations) [44,45]. In this study, we aim to evaluate the performance of sleep posture estimation using different arrangements and combinations of eight radar sensors. Another innovation of this study is the proposal of a data fusion technique for the multiple radar configuration. This technique is designed to facilitate more efficient processing by deep learning models and to improve the effectiveness of sleep posture estimation. We first applied deep learning models, including ResNet, EfficientNet, DenseNet, PHResNet, residual attention network (Attention-56), and Swin Transformer, to the data of all radar sensors. Subsequently, we experimented with different arrangements and combinations of the radar sensors for the model that demonstrated the best performance. The main contributions of this study are as follows:

• Incorporating a Multiview Convolutional Neural Network (MVCNN) architecture to leverage deep feature extractors for precise sleep posture estimation.
• Introducing Spatial Radar Echo Maps (SREMs) to enhance radar-based sleep posture prediction.
• Identifying the optimal radar sensor configuration for an improved posture estimation accuracy.

System Setup
This study employed impulse-radio ultra-wideband (IR-UWB) radar. IR-UWB radars transmit short-duration impulse signals using a transmitter. When the emitted radar pulse encounters an object, the pulse partially penetrates the object, while the remainder is reflected back to the receiver. The Time of Arrival (TOA) of the reflected pulse is measured to determine the distance between the target and the radar. Mathematically, the received signal can be expressed as in Equation (1):
$$r(t) = \sum_{i=1}^{P} A_i \, s(t - \tau_i) + n(t) \tag{1}$$
where $r(t)$ is the received signal, $s(t)$ is the transmitted pulse, $P$ is the number of multipaths, $A_i$ is the amplitude associated with the $i$-th multipath, $\tau_i$ is the time delay of the $i$-th multipath component, and $n(t)$ represents the noise captured from random variations and disturbances in the channel.
We utilized eight IR-UWB radar sensors, which are integrated system-on-chips (Xethru X4M03 v5 from Novelda, Oslo, Norway) operating at a center frequency of 7.29 GHz and a bandwidth of 1.4 GHz. They comprise two key components: a programmable controller and an antenna. On the receiving end, the system has a high sampling rate of 23.328 GS/s and a total radar frame length of 9.87 m. This corresponds to a distance resolution of 0.00643 m between consecutive data points received by the radar. Additionally, the receiver maintains a sufficient gain of 14.1 dB and a low noise figure of 6.7 dB. Both the elevation and azimuth angles of the radars spanned a wide range from −65° to +65°. Table 1 shows the parameters used in this study. Notably, the detection range of the radars was set to encompass the region of interest (RoI) of our study.
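The relation between the sampling rate and the spacing of the radar bins can be checked with a short calculation. The sketch below assumes the usual two-way travel relation, spacing = c / (2 × sampling rate); the function names are illustrative and are not taken from the radar's software.

```python
# Speed of light in vacuum (m/s).
SPEED_OF_LIGHT = 299_792_458.0


def range_bin_spacing(sampling_rate_hz: float) -> float:
    """Distance (m) between consecutive samples, accounting for two-way travel."""
    return SPEED_OF_LIGHT / (2.0 * sampling_rate_hz)


def bins_per_frame(frame_length_m: float, sampling_rate_hz: float) -> int:
    """Approximate number of range bins that fit in one radar frame."""
    return round(frame_length_m / range_bin_spacing(sampling_rate_hz))


# Values quoted in the text: 23.328 GS/s sampling rate, 9.87 m frame length.
spacing = range_bin_spacing(23.328e9)
n_bins = bins_per_frame(9.87, 23.328e9)
print(f"{spacing:.5f} m per bin, about {n_bins} bins per frame")
```

Under this assumption, the computed spacing rounds to the 0.00643 m stated above.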

Radar Placement
The radars were positioned around a bed measuring 196 cm × 90 cm × 60 cm (length, width, and height). Five radars were positioned on the side of the bed, 78 cm from the ground, aimed at the body, torso, and limbs and spaced at 15 cm intervals. They are abbreviated as S1, S2, S3, S4, and S5, from cranial to caudal. Three radars were placed at the headboard side of the bed, 108 cm from the ground, spanning the shoulders and the head; they were positioned higher than the side radars to avoid blockage by the headboard. They are abbreviated as HL, HC, and HR, corresponding to left, central, and right, respectively. To accommodate scenarios where participants might extend their limbs beyond the edges of the bed and accidentally hit the radars, all radars were positioned 20 cm from the edges at the aforementioned heights. We first assessed the impact of the head radars. Subsequently, we decided to maintain the use of the central radar due to its extensive coverage and alignment with our existing study. Following this, we evaluated the influence of the side radars. The configuration of the sensor placements, along with their labels, is illustrated in Figure 1.

Experiment Protocol and Data Collection
A total of 70 adults (39 males and 31 females) were recruited from a university to participate in this experiment. The inclusion criterion was being an adult aged over 18. The exclusion criteria were the absence of any limb or pregnancy. People who had difficulty staying or being positioned in a specific posture in bed were also excluded. The study was approved by the Institutional Review Board of The Hong Kong Polytechnic University (Reference Number: HSEARS20210127007). Before the experiment began, all participants received a thorough explanation of the procedures, both orally and in writing. Informed consent was obtained from all subjects involved in the study.
The average age of the enrolled participants was 26.3 years (standard deviation: 11.3 years, ranging from 18 to 67). Their average height was 168.2 cm (standard deviation: 8.32 cm, ranging from 150 cm to 186 cm) and their average weight was 64.0 kg (standard deviation: 12.3 kg, ranging from 43 kg to 108 kg). The average BMI was 22.6 (standard deviation: 4.03, ranging from 16.3 to 43.8). During the experiment, the participants removed all metal-containing clothing or accessories (e.g., belts), shoes, and outerwear. They were instructed to start in the supine position on a bed with a pillow. Then, they were asked to position themselves in nine sleeping postures sequentially, as shown in Figure 2.
When the participants were instructed to perform a specific posture, they could position their limbs and bodies in a manner they found comfortable, as long as it adhered to the defined instructions for the postures. Once the participants confirmed their posture, they were required to remain stationary. The researchers then proceeded to sequentially drape blankets over the participants, ranging from thick to thin. After each blanket was positioned, the researchers paused for five seconds to allow for ambient recording time. After all blanket conditions were tested, a bell signaled the transition to the next posture, and this cycle continued until all nine postures were tested in the three blanket conditions (note: the null blanket condition was not included in our analysis; Figure 2 is just for illustration). The entire process was repeated three times, resulting in three repeated trials. In total, the experiment yielded 5670 data samples (70 participants × 9 postures × 3 blanket conditions × 3 trials) that were manually labeled.

Spatial Radar Echo Map (SREM)
A typical IR-UWB radar data frame is a 2D matrix where each row corresponds to a different radar pulse, capturing the temporal evolution of the scene, and each column corresponds to a sample point within a single radar pulse, capturing high-resolution distance information. For each radar sensor, we extracted a data frame at a specific time instance and performed noise cancellation using clutter suppression, achieved through a mean subtraction method as illustrated in Equation (2) [40]:
$$\hat{X}[n, m] = X[n, m] - \frac{1}{N}\sum_{i=1}^{N} X[i, m] \tag{2}$$
where $X$ is the radar frame with the radar bin $n$ and time $m$, $\hat{X}$ is the clutter-suppressed frame, and $N$ denotes the total number of radar bins.
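The mean subtraction step can be sketched in a few lines of NumPy. This is a minimal illustration, assuming that the mean is taken over the N radar bins of each pulse, as the definitions of X and N suggest; the function name `suppress_clutter` is hypothetical.

```python
import numpy as np


def suppress_clutter(frame: np.ndarray) -> np.ndarray:
    """Remove the per-pulse mean from a radar data frame.

    frame: 2D array whose rows are radar pulses (time index m) and whose
           columns are range bins (bin index n), as described in the text.
    Returns an array of the same shape in which every row has zero mean.
    """
    frame = np.asarray(frame, dtype=float)
    # Average over the N range bins of each pulse, then subtract it.
    return frame - frame.mean(axis=1, keepdims=True)


# Toy frame: 2 pulses x 3 range bins.
toy = np.array([[1.0, 2.0, 3.0],
                [10.0, 10.0, 10.0]])
print(suppress_clutter(toy))
```

A constant offset shared by all bins of a pulse (the second row of the toy frame) is removed entirely, which is the intended effect of this kind of DC or static-clutter suppression.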
For each radar frame, the Radar Echo Map Generation algorithm (Algorithm 1) computed the distance from the radar location to every grid point within the predetermined map limits. We then initialized a 2D grid based on the size of the bed. Using the calculated distances, the algorithm identified the nearest radar bins for each grid point and employed interpolation techniques to estimate the radar reflectivity intensity at the specific location.

Algorithm 1. Radar Echo Map Generation
Input: Radar R_i, defined as arrays of intensity of all radar bins
Output: Two-dimensional intensity map Q distributed on the bed
Initialisation:
1: (x_0, y_0) = (0, 0)
2: (x_N, y_N) = (90, …
…
9: for n in x_0, …, x_N do
10: for m in y_0, …, y_M do
11: …
15: end for
16: end for
17: end for
18: return Q[n][m]
d: the distance between each radar bin; N: the number of x bins mapped to the short edge of the bed; M: the number of y bins mapped to the long edge of the bed; b: radar bin number.
The objective of interpolation is to distribute the radar reflectivity values spatially according to their distance from the radar source. After iterating over all grid points within the defined map boundaries, the radar data are registered and mapped into a spatial representation. Figure 3 shows the Spatial Radar Echo Maps (SREMs) generated from the eight radar sensors, each showing a sector-shaped coverage area. This sector shape emerges from the effective azimuth angle of the radar, spanning from −65 to +65 degrees. It is presumed that the radar signals diminish significantly beyond this azimuth range.
Upon initialization, Algorithm 1 established the map's boundaries (start and end positions in centimeters), the distance represented by each data bin from the radar, and the number of bins in both the horizontal and vertical directions. It then created stacks of 2D arrays, where each element corresponded to a specific location within the map.
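The mapping described in this section can be sketched as follows. This is not the authors' Algorithm 1, whose intermediate steps are not reproduced here; it is a minimal illustration under several assumptions: distances are computed in the bed plane only (radar height is ignored), the grid has 1 cm cells, linear interpolation between neighboring bins is used, and cells outside the ±65° azimuth sector are set to zero. The function name, radar coordinates, and boresight angles are hypothetical.

```python
import numpy as np


def radar_echo_map(intensity, radar_xy, boresight_deg, bin_spacing_cm=0.643,
                   bed_size_cm=(90, 196), cell_cm=1.0, half_fov_deg=65.0):
    """Project one radar's range profile onto a 2D grid covering the bed.

    intensity:     1D array, echo intensity of each radar bin.
    radar_xy:      (x, y) radar position in bed coordinates (cm).
    boresight_deg: pointing direction of the radar in the bed plane (degrees).
    """
    intensity = np.asarray(intensity, dtype=float)
    xs = np.arange(0.0, bed_size_cm[0] + cell_cm, cell_cm)  # short edge
    ys = np.arange(0.0, bed_size_cm[1] + cell_cm, cell_cm)  # long edge
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    dx, dy = gx - radar_xy[0], gy - radar_xy[1]

    # Distance from the radar to every grid point.
    dist = np.hypot(dx, dy)

    # Range (cm) represented by each radar bin, then interpolate the
    # intensity at each grid distance; out-of-range cells become zero.
    bin_ranges = np.arange(intensity.size) * bin_spacing_cm
    echo = np.interp(dist.ravel(), bin_ranges, intensity,
                     left=0.0, right=0.0).reshape(dist.shape)

    # Keep only the sector inside the assumed azimuth field of view.
    angle = np.degrees(np.arctan2(dy, dx)) - boresight_deg
    angle = (angle + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    echo[np.abs(angle) > half_fov_deg] = 0.0
    return echo


# Hypothetical example: one side radar and one head radar, 20 cm off the bed.
signal = np.random.default_rng(0).random(400)
side_map = radar_echo_map(signal, radar_xy=(-20.0, 60.0), boresight_deg=0.0)
head_map = radar_echo_map(signal, radar_xy=(45.0, -20.0), boresight_deg=90.0)
views = np.stack([side_map, head_map])  # one 2D map per radar view
print(views.shape)  # (2, 91, 197)
```

Stacking one map per radar gives the multiview input that a network such as MVCNN can consume, with every view registered to the same bed coordinate frame.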

Model Training
We utilized the Multiview Convolutional Neural Network (MVCNN) approach, which was originally proposed to recognize a 3D object from multiple 2D images captured from various perspectives [47]. It applied a deep feature extractor to the radar echo map from each radar, followed by a view pooling operation across all views and fully connected layers for the final classification, as illustrated in Figure 4. In this study, we evaluated the use of ResNet-50 [48], EfficientNet-B0 [49], DenseNet-121 [50], PHResNet-50 [51], residual attention network (Attention-56) [52], and Swin Transformer [53] as the deep feature extractors. PHResNet-50 (Parametrized-Hypercomplex ResNet) is one of the cutting-edge models that facilitates hypercomplex learning for multiview data.
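The view pooling step can be illustrated with a small NumPy sketch. It assumes element-wise max pooling across views, as in the original MVCNN formulation; the text does not state which pooling operator was used here. The feature vectors and classifier weights below are random placeholders, not trained values.

```python
import numpy as np


def view_pool(view_features):
    """Element-wise max over views: (n_views, feature_dim) -> (feature_dim,)."""
    return np.max(np.asarray(view_features, dtype=float), axis=0)


def classify(pooled, weights, bias):
    """Single fully connected layer followed by a softmax."""
    logits = weights @ pooled + bias
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()


rng = np.random.default_rng(0)
n_views, feature_dim, n_classes = 8, 1024, 4    # eight radars, four postures
features = rng.normal(size=(n_views, feature_dim))  # stand-in for extractor output
weights = rng.normal(scale=0.01, size=(n_classes, feature_dim))
bias = np.zeros(n_classes)

pooled = view_pool(features)
probabilities = classify(pooled, weights, bias)
print(pooled.shape, probabilities.round(3))
```

Because the pooling is taken over the view axis, the classifier input has the same size regardless of how many radars are used, which is convenient when radars are removed in the configuration experiments.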
The hyperparameters remained at their default values. The data were split into training and testing sets at a 55:15 ratio by participant. Specifically, data from 55 randomly selected participants were used for the model training, while the data of the remaining 15 participants were used for the model testing. Cross-entropy, which acts as the loss function of the model, guides the network to adjust its internal weights to minimize classification errors. We adopted AdamW, a variant of the Adam optimizer with decoupled weight decay regularization. The learning rate was set to 0.001. The betas parameter was a tuple of two values (0.9, 0.999).
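A participant-wise split of this kind can be sketched as below. The random seed and the participant numbering (1 to 70) are assumptions made for illustration; the actual assignment used in the study is not given in the text.

```python
import random


def split_participants(participant_ids, n_train, seed=0):
    """Randomly assign whole participants to the training or testing set."""
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    return sorted(ids[:n_train]), sorted(ids[n_train:])


train_ids, test_ids = split_participants(range(1, 71), n_train=55)

# Each participant contributes 9 postures x 3 blanket conditions x 3 trials.
samples_per_participant = 9 * 3 * 3
n_train_samples = len(train_ids) * samples_per_participant
n_test_samples = len(test_ids) * samples_per_participant
print(n_train_samples, n_test_samples)  # 4455 1215
```

Splitting by participant rather than by sample keeps all recordings of a person on one side of the split, so the test accuracy reflects performance on unseen individuals.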

Evaluation and Analysis
The performances of the models were evaluated using the accuracy measure, which is defined as the ratio of correct predictions to the number of cases in the testing set. In addition to the full analysis (i.e., 9-class classification), we also evaluated 4-class coarse-grained classification to provide more insights on the performances of the models. This involved categorizing the nine original classes into four coarse categories: supine, left, right, and prone. The four-class classification model was then trained and evaluated independently. The categorization of the nine classes was as follows: (1) Supine: S;
Once the optimal model was identified, we retrained and retested it using data from various numbers and placements of radar sensors. However, it is important to note that we did not explore all possible combinations of radar sensor numbers and placements. Instead, we pre-planned several combinations and quantities based on specific premises, which are detailed in Section 3. In total, we experimented with 22 different settings involving various numbers and combinations of radar sensors. We decided to use the 4-class classification scheme for the evaluation of the radar configurations, since this approach offers a more interpretable means to understand which radar configurations contributed more significantly to the model performance.
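The accuracy measure defined at the start of this section amounts to the following calculation. The labels in the example are made up for illustration and are not results from the study.

```python
def accuracy(predicted, actual):
    """Ratio of correct predictions to the number of test cases."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("the testing set must not be empty")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative four-class labels only.
predicted = ["supine", "left", "right", "prone"]
actual = ["supine", "left", "left", "prone"]
print(accuracy(predicted, actual))  # 0.75
```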

Performance of Deep Learning Models
As shown in Table 2, DenseNet-121 consistently outperformed the others with accuracies of 0.534 and 0.804 for the nine-class and four-class classifications, respectively. EfficientNet-B0 also demonstrated a competitive performance, achieving an accuracy of 0.775 for the four-class classification. Attention-56 managed to achieve an accuracy of 0.469 in the coarse-grained classification, but it failed to converge in the fine-grained classification. The Swin Transformer model did not converge in any of the classification tasks.

Performance of Different Radar Arrangements and Placements
DenseNet-121, ResNet-50, EfficientNet-B0, and PHResNet-50 were selected for further analysis of the radar arrangements and placements. Table 3 shows the impacts of varying radar configurations on the accuracy of the four-class posture classification. The baseline configuration (#1) with all eight radars achieved an accuracy of 0.804. This performance did not weaken much when one or two radars were removed (#2 to #7), showing accuracies of 0.794 and 0.771 for DenseNet-121, respectively. When only six radars were retained (#5 to #7), the placement of the radars played an important role. Interestingly, the performance of a specific configuration (#5) surpassed the baseline with an accuracy of 0.809 for DenseNet-121, while that of #7 was very near to the baseline with an accuracy of 0.803. This finding showed that the HL radar played an important role in the model performance. The variations in performance became greater when the number of radars was further reduced.
Table 4 compares the average accuracies across the various models with different numbers of radars.When only six radars were used, DenseNet-121 revealed an average performance of 0.799, which is comparable to the baseline configuration with eight radars.Although ResNet-50 generally outperformed DenseNet-121 across the other radar configurations, DenseNet-121 consistently showed an optimal performance, indicating its suitability for adoption.

Discussion
The objective of this study was to determine the ideal quantity and positioning of radar sensors for sleep posture estimation, thereby laying the groundwork for the optimal sensor configuration in this application. We incorporated eight radars into the baseline setup, with three positioned at the headboard and five along the side. In order to accommodate the multiple radar sensors, we introduced a data fusion method for generating radar maps, the Spatial Radar Echo Map (SREM), and applied the Multi-View Convolutional Neural Network (MVCNN) to the resulting maps.
Multimodal data fusion has attracted significant attention in recent studies.The integration of diverse data sources may enhance the predictive accuracy and robustness in various applications, specifically for situations where time series data are the major sensory data type [54,55].We employed a data fusion approach in this study, since our study utilized multiple IR-UWB radars as the primary devices for sleep posture recognition.
Moreover, we employed sensor removal to isolate the influences of individual radars within the chosen model.We opted to focus on the best-performing model (DenseNet-121), since this enables a more precise attribution of performance variations to the removed sensors.
In regard to sensor placement, for the head radars, positioning a single one on the left was both crucial and adequate.A lack of all head radars led to a significant decrease in prediction accuracy.However, adding more radars could potentially diminish this accuracy slightly.This could be attributed to the possibility that extra radars at the shoulder may not provide informative data, but rather contribute to noise.Nevertheless, we decided to maintain the central radar because of its better exposure and alignment with our existing study.
Increasing the number of side radars generally improved the prediction accuracy. This could be because all radars were essential for identifying the fine-grained features in postures, such as limb placement, which helps to distinguish between postures like the log, fetal, and half-stomach positions. If we aim to limit the number of side radars to four or three, the optimal configuration involves removing the central radars for the head edge and retaining those focused on the upper-body regions for the side edge. Radars targeting the upper-body region appear to be more effective in estimating sleep postures in general. We initially hypothesized that the baseline configuration would yield the highest accuracy. However, configurations (#6) and (#8) demonstrated comparable accuracies, despite the removal of two head radars. This finding suggests that there may be a ceiling effect when an adequate number of side radars is employed. In our setup, all side radars were placed equidistantly to ensure uniform exposure, which may have contributed to this ceiling effect. Future research should explore not only the number of radars used, but also the interval of their placement.
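To make the accuracy trade-off concrete, the short calculation below expresses the four-class accuracies quoted in the abstract as changes relative to the eight-radar baseline (0.804). The dictionary keys are descriptive labels chosen here for illustration and are not the configuration numbers used in the tables.

```python
# Four-class DenseNet-121 accuracies as quoted in the abstract.
BASELINE = 0.804  # eight radars: three at the headboard, five along the side

# Keys are illustrative labels, not the paper's configuration numbers.
REPORTED = {
    "single left head radar": 0.809,
    "central head radar, central side radar omitted": 0.779,
    "central head radar, three upper-body side radars only": 0.753,
}


def accuracy_change(accuracy: float, baseline: float = BASELINE) -> float:
    """Absolute change in accuracy relative to the baseline, to 3 decimals."""
    return round(accuracy - baseline, 3)


changes = {name: accuracy_change(acc) for name, acc in REPORTED.items()}
for name, delta in changes.items():
    print(f"{name}: {delta:+.3f}")
```

Read this way, dropping two head radars costs nothing measurable, whereas reducing the side radars lowers accuracy by roughly 0.025 and 0.051 for the two reduced configurations.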
We compared the performance of our system to that of existing studies (Table 5). Zhou et al. [56] utilized an FMCW radar system with a CNN incorporating an Inception-Residual module across eight sleep postures, with an overall accuracy of 87.2%. Piriyajitakonkij et al. [40] employed the Xethru X4M03 radar and SleepPoseNet, achieving an accuracy of 73.7 ± 0.8% across four sleep postures. Islam and Lubecke [57] used a dual-frequency monostatic CW radar with multiple classifiers (KNN, SVM, and Decision Tree) and reported an accuracy of 98.4% with the dual-frequency configuration. Adhikari [58] used a Texas Instruments IWR1443 radar with the Rest Network, a customized CNN, achieving an accuracy of 80.8% across five postures without blankets. Our previous study [44] utilized the spatial-temporal features of continuous radar frames, employing various models, including the Swin Transformer with the Xethru X4M03 radar, and achieving an accuracy of up to 80.8% for four sleep postures with blankets. In this study, we decided to utilize the spatial features of a single radar frame, which could enable real-time application. Using the Xethru X4M03 radar and the DenseNet-121 model, we classified four sleep postures under three blanket conditions, achieving a highest accuracy of 80.9%. This indicates that our approach is comparable to or slightly better than previous results, despite the additional complexity of three blanket conditions.
There were some limitations in this study. While our proposed data fusion technique based on radar map generation reinforced the presentation of spatial information, temporal information might also be useful for estimating sleep postures, since it captures plausible transitions between postures. The quasi-periodic oscillations in radar signals contributed by vital signs might facilitate attention to the torso region and improve the performance of posture estimation [44]. The constraint of data size was another limitation. Deep learning models generally require substantial amounts of data to achieve optimal performance and model convergence, especially complicated models. In our study, we observed that the Swin Transformer did not converge in either the fine-grained or the coarse-grained classification, while Attention-56 did not converge in the fine-grained classification and underperformed in the coarse-grained classification. Both models belong to the class of attention-based models, which are fundamentally different from convolutional networks. Convolutional networks primarily focus on local spatial features through filters, whereas attention mechanisms are intended to capture long-range (non-local) relationships. However, implementing the attention mechanism also comes with a trade-off: a significant increase in the number of parameters (Table 2). This, in turn, necessitates training with more data, especially data rich in latent information, for the attention module to effectively capture these subtle relationships. If the dataset lacks sufficient non-local features or contains repetitive long-range features, the Transformer model may fail to converge. In our study, the feature signatures in the radar spatial map were localized, indicating a lack of non-local relationships. This could be a potential reason for the non-convergence of the models. Furthermore, our radar map represents a single instant without any time features. This means we could not track the movements of individuals over time to facilitate the attention mechanism. A time domain might introduce non-local temporal features that could aid the convergence of Transformer-class models.
Prior research on sleep posture recognition has often prioritized the expansion of recognized postures, in addition to the presence of blankets. However, the orientation and covering style of these blankets are important, yet frequently neglected, factors that might influence accurate classification. Our study prioritizes real-world applicability by acknowledging the variability in self-covering behaviors during sleep. To address this, we will incorporate scenarios with diverse blanket orientations and covering methods as part of our external testing procedures. We posit that this inclusion will enhance the generalizability of our proposed sleep posture recognition model.

Conclusions
This study identified the optimal combination of radar quantity and placement, starting with eight radar sensors, three at the headboard and five along the side. The left head

Figure 1. Radar placement around the bed. S1-S5 denote radar sensors arranged from cranial (S1) to caudal direction. HL, HC, and HR denote radar positions at the left, center, and right of the headboard.

Figure 2. Illustration of the nine sleep postures with three blanket conditions (thick, medium, and thin). The postures are: supine (S); left lateral side lying with both legs extended (L.Log); left lateral side lying at a half-stomach position (L.Sto); left lateral side lying at a fetal position (L.Fet); right lateral side lying (R.Log); right lateral side lying at a half-stomach position (R.Sto); right lateral side lying at a fetal position (R.Fet); prone position with head turned left (L.Pr); and prone position with head turned right (R.Pr). The no-blanket condition is displayed for illustration and not included in the dataset.

Figure 3. An illustration of Spatial Radar Echo Maps (SREMs) in all radars.

Figure 4. Model architecture of MVCNN with different feature extractors.

Algorithm 1. Radar Echo Map Generation
Input: Radar frame, defined as arrays of intensity of all radar bins
Output: Two-dimensional intensity map Q distributed on the bed
Parameters: distance between each radar bin; N: the number of x-bins mapped to the short edge of the bed; M: the number of y-bins mapped to the long edge of the bed; b: radar bin number.
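The body of Algorithm 1 is not reproduced here, so the following is only a minimal sketch of one plausible way to build such a map, under stated assumptions: each cell of an N × M grid laid over the bed is assigned the intensity of the radar bin b whose range matches the planar distance from the radar to the cell centre. The function name `echo_map`, the radar position, the bed dimensions, the bin spacing, and the toy frame are all hypothetical, and the sketch ignores radar height and antenna pattern.

```python
import math
from typing import List, Sequence, Tuple


def echo_map(
    frame: Sequence[float],         # intensity of each radar bin
    radar_xy: Tuple[float, float],  # hypothetical radar position on the bed plane (m)
    bed_size: Tuple[float, float],  # hypothetical (short edge, long edge) of the bed (m)
    n_xbins: int,                   # N: cells along the short edge
    m_ybins: int,                   # M: cells along the long edge
    bin_spacing: float,             # distance between consecutive radar bins (m)
) -> List[List[float]]:
    """Assumed mapping: Q[j][i] takes the intensity of the bin at the cell's range."""
    width, length = bed_size
    radar_x, radar_y = radar_xy
    q = [[0.0] * n_xbins for _ in range(m_ybins)]
    for j in range(m_ybins):
        y = (j + 0.5) * length / m_ybins      # cell centre along the long edge
        for i in range(n_xbins):
            x = (i + 0.5) * width / n_xbins   # cell centre along the short edge
            b = int(math.hypot(x - radar_x, y - radar_y) / bin_spacing)
            if b < len(frame):                # cells beyond the last bin stay at 0.0
                q[j][i] = frame[b]
    return q


# Toy example: a radar at the middle of the headboard of a 1 m x 2 m bed.
q = echo_map(frame=[10.0, 20.0, 30.0, 40.0, 50.0], radar_xy=(0.5, 0.0),
             bed_size=(1.0, 2.0), n_xbins=2, m_ybins=4, bin_spacing=0.5)
for row in q:
    print(row)
```

Because every radar's frame is projected onto the same bed-aligned grid, maps from different radars share a common spatial reference, which is what makes them convenient inputs for fusion across views.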

Table 2. Accuracy of different deep learning models as deep feature extractors in 9-class and 4-class sleep posture classification, and the number of parameters of each model. NC: model did not converge.

Table column headers: S1, S2, S3, S4, S5, HL, HC, HR, DenseNet-121, ResNet-50.
N: number of radar sensors. S1-S5 denote radar sensors arranged from cranial (S1) to caudal direction. HL, HC, and HR denote radar positions at the left, center, and right of the headboard. #1-#22 denote configuration numbers 1 to 22. × indicates the radar to retain.

Table 5. Comparison of accuracy performance with existing studies. CNN: convolutional neural network; MV: Multiview; Nb: number of blanket conditions; Np: number of participants; Ns: number of sleep postures to be classified; w/: with.